Auditory Spectrum-Based Pitched Instrument Onset Detection
In this paper, a method for onset detection of music signals using auditory spectra is proposed. The auditory spectrogram provides a time-frequency representation that employs a sound processing model resembling the human auditory system. Recent work on onset detection employs DFT-based features describing spectral energy and phase differences, as well as pitch-based features. These features are often combined to maximize detection performance. Here, the spectral flux and phase slope features are derived in the auditory framework, and a novel fundamental frequency estimation algorithm based on auditory spectra is introduced. An onset detection algorithm is proposed, which processes and combines the aforementioned features at the decision level. Experiments are conducted on a dataset covering 11 pitched instrument types, consisting of 1829 onsets in total. Results indicate that auditory representations outperform various state-of-the-art approaches, with the onset detection algorithm reaching an F-measure of 82.6%.
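For orientation, the DFT-based spectral flux baseline that the paper builds on can be sketched as follows (a generic half-wave-rectified variant with illustrative parameters; the paper's contribution replaces the DFT front end with an auditory spectrogram):

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    """Detect onsets via half-wave-rectified spectral flux over a
    windowed DFT magnitude spectrogram, followed by simple peak picking.
    This is a generic baseline sketch, not the auditory-spectrum method."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    mags = np.array([np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame]))
                     for i in range(n_frames)])
    # Half-wave rectification keeps only frame-to-frame magnitude increases,
    # summed over frequency bins, so energy decays do not trigger onsets.
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12
    # Peak picking: local maxima above a fixed threshold.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1] and flux[i] > delta]
    return [(i + 1) * hop / sr for i in peaks]  # onset times in seconds
```

The auditory-spectrum variant keeps the same rectified-difference detection function but computes it over the auditory time-frequency representation instead of the DFT magnitudes.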
Expressive visual text to speech and expression adaptation using deep neural networks
In this paper, we present an expressive visual text-to-speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus. Furthermore, we present a method of adapting a previously trained DNN to include a new expression using a small amount of training data. Experiments show that the proposed DNN-based VTTS is preferred by 57.9% over the baseline hidden Markov model based VTTS which uses cluster adaptive training.
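One plausible reading of "a blend of expressions" is a convex combination of per-expression control vectors fed to the DNN alongside the linguistic input; a minimal sketch (the function name, shapes and weighting scheme are assumptions, not the paper's):

```python
import numpy as np

def blend_expression_code(codes, weights):
    """Form a single DNN control vector as a convex combination of
    per-expression codes from the training corpus (illustrative only)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                     # normalise to a convex combination
    return w @ np.asarray(codes)     # weighted sum of expression codes
```

With one-hot codes this reduces to the normalised weights themselves; a learned embedding per expression would be blended the same way.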
Tensin1 expression and function in chronic obstructive pulmonary disease
Chronic obstructive pulmonary disease (COPD) constitutes a major cause of morbidity and mortality. Genome-wide association studies have shown significant associations between airflow obstruction or COPD and a non-synonymous SNP in the TNS1 gene, which encodes tensin1. However, the expression, cellular distribution and function of tensin1 in human airway tissue and cells are unknown. We therefore examined these characteristics in tissue and cells from controls and people with COPD or asthma. Airway tissue was immunostained for tensin1. Tensin1 expression in cultured human airway smooth muscle cells (HASMCs) was evaluated using qRT-PCR, western blotting and immunofluorescent staining. siRNAs were used to downregulate tensin1 expression. Tensin1 expression was increased in the airway smooth muscle and lamina propria in COPD tissue, but not asthma, when compared to controls. Tensin1 was expressed in HASMCs and upregulated by TGFβ1. TGFβ1 and fibronectin increased the localisation of tensin1 to fibrillar adhesions. Tensin1 and α-smooth muscle actin (αSMA) were strongly co-localised, and tensin1 depletion in HASMCs attenuated both αSMA expression and contraction of collagen gels. In summary, tensin1 expression is increased in COPD airways, and may promote airway obstruction by enhancing the expression of contractile proteins and their localisation to stress fibres in HASMCs.
Bird detection in audio: a survey and a challenge
Many biological monitoring projects rely on acoustic detection of birds. Despite increasingly large datasets, this detection is often manual or semi-automatic, requiring manual tuning and postprocessing. We review the state of the art in automatic bird sound detection, and identify a widespread need for tuning-free and species-agnostic approaches. We introduce new datasets and an IEEE research challenge to address this need, to make possible the development of fully automatic algorithms for bird sound detection.
Robust excitation-based features for Automatic Speech Recognition
In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the usually considered vocal-tract-based features for automatic speech recognition (ASR). The features are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation features expand the set of excitation features previously considered for ASR, with the expectation that these features help to better discriminate the broad phonetic classes (e.g., fricatives, nasals, vowels, etc.). Relative improvements in the word error rate are observed in the AMI meeting transcription system, with greater gains (about 5%) when PLP features are combined with the suggested excitation features. For Aurora 4, significant improvements are observed as well. Combining the suggested excitation features with filter banks, a word error rate of 9.96% is achieved. This is the author accepted manuscript; the final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717885
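Combining complementary feature streams for a DNN acoustic model is typically done by per-dimension normalisation followed by frame-wise concatenation; a minimal sketch (the normalisation choice is an illustrative assumption, not prescribed by the paper):

```python
import numpy as np

def combine_features(vocal_tract_feats, excitation_feats):
    """Concatenate per-frame vocal-tract features (e.g. PLP or filter
    banks) with excitation features after mean/variance normalisation,
    yielding one augmented input vector per frame for a DNN."""
    def mvn(x):
        # Per-dimension mean/variance normalisation over the utterance.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.hstack([mvn(vocal_tract_feats), mvn(excitation_feats)])
```

The DNN then sees each frame as a single vector, so any complementary information in the excitation stream is available to every hidden layer.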
Dysphonia Detection based on modulation spectral features and cepstral coefficients
In this paper, we combine modulation spectral features with mel-frequency cepstral coefficients for automatic detection of dysphonia. For classification purposes, the dimensions of the original modulation spectra are reduced using higher-order singular value decomposition (HOSVD). The most relevant features are selected based on their mutual information with the discrimination between normophonic and dysphonic speakers made by experts. Features that highly correlate with voice alterations are then fed to a support vector machine (SVM) classifier to provide an automatic decision. Recognition experiments using two different databases suggest that the system provides complementary information to the standard mel-cepstral features.
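The dimensionality-reduction step can be illustrated with an ordinary truncated SVD on the unfolded modulation-spectrum tensor (a simplified stand-in for HOSVD, which instead factorises each tensor mode; shapes and the function name are illustrative):

```python
import numpy as np

def reduce_modulation_features(tensor, k):
    """Reduce a (frames x acoustic-freq x modulation-freq) modulation
    spectrum to k features per frame by truncated SVD on its unfolding.
    A simplified stand-in for the HOSVD step, for illustration only."""
    X = tensor.reshape(tensor.shape[0], -1)  # unfold to frames x features
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                      # project onto top-k components
```

HOSVD generalises this by applying an SVD along every mode of the tensor rather than a single unfolding, preserving the separate acoustic- and modulation-frequency structure.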
A fixed dimension and perceptually based dynamic sinusoidal model of speech
This paper presents a fixed- and low-dimensional, perceptually based dynamic sinusoidal model of speech referred to as PDM (Perceptual Dynamic Model). To decrease and fix the number of sinusoidal components typically used in the standard sinusoidal model, we propose to use only one dynamic sinusoidal component per critical band. For each band, the sinusoid with the maximum spectral amplitude is selected and associated with the centre frequency of that critical band. The model is expanded at low frequencies by incorporating sinusoids at the boundaries of the corresponding bands, while at the higher frequencies a modulated noise component is used. A listening test is conducted to compare speech reconstructed with PDM and state-of-the-art models of speech, where all models are constrained to use an equal number of parameters. The results show that PDM is clearly preferred in terms of quality over the other systems. Index Terms: sinusoidal model, critical band, vocoder.
Selective CO₂ capture in metal-organic frameworks with azine-functionalized pores generated by mechanosynthesis
Two new three-dimensional porous Zn(II)-based metal-organic frameworks, containing azine-functionalized pores, have been readily and quickly isolated via mechanosynthesis, using a nonlinear dicarboxylate and linear N-donor ligands. The use of nonfunctionalized and methyl-functionalized N-donor ligands has led to the formation of frameworks with different topologies and metal-ligand connectivities, and therefore different pore sizes and accessible volumes. Despite this, both metal-organic frameworks (MOFs) possess comparable BET surface areas and CO₂ uptakes at 273 and 298 K at 1 bar. The network with narrow and interconnected pores in three dimensions shows greater affinity for CO₂ compared to the network with one-dimensional and relatively large pores, attributable to the more effective interactions with the azine groups.